Federated learning (FL) is a promising approach to enable the future Internet of vehicles consisting of intelligent connected vehicles (ICVs) with powerful sensing, computing and communication capabilities. We consider a base station (BS) coordinating nearby ICVs to train a neural network in a collaborative yet distributed manner, in order to limit data traffic and privacy leakage. However, due to the mobility of vehicles, the connections between the BS and ICVs are short-lived, which affects the resource utilization of ICVs, and thus, the convergence speed of the training process. In this paper, we propose an accelerated FL-ICV framework, by optimizing the duration of each training round and the number of local iterations, for better convergence performance of FL. We propose a mobility-aware optimization algorithm called MOB-FL, which aims at maximizing the resource utilization of ICVs under short-lived wireless connections, so as to increase the convergence speed. Simulation results based on the beam selection and the trajectory prediction tasks verify the effectiveness of the proposed solution.
translated by 谷歌翻译
车辆到所有(V2X)网络已使自主驾驶中的合作感达到了协作感,这是对独立情报的根本缺陷的有前途的解决方案,包括盲区和远距离感知。但是,缺乏数据集严重阻碍了协作感知算法的发展。在这项工作中,我们发布了海豚:用于协作感知的数据集,可以使和谐且相互联系的自动驾驶,这是一个新的模拟大规模的各种大规模的各种赛车多模式多模式自动驾驶数据集,该数据集为互连为互连的开创性基准平台提供自动驾驶。海豚在六个维度上优于当前数据集:从车辆和道路侧单元(RSU)(RSUS)的临时图像和点云,启用车辆到车辆(V2V)和车辆到基础设施(V2I)的协作感知; 6具有动态天气条件的典型场景使各种互连的自动驾驶数据集最多;精心选择的观点,提供关键区域和每个对象的全部覆盖范围; 42376帧和292549个对象,以及相应的3D注释,地理位置和校准,构成了最大的协作知觉数据集;全高清图像和64线激光雷达构建高分辨率数据,并具有足够的详细信息;组织良好的API和开源代码可确保海豚的可扩展性。我们还构建了2D检测,3D检测和关于海豚的多视图协作任务的基准。实验结果表明,通过V2X通信的原始融合方案可以帮助提高精度,并在RSU存在时减少昂贵的LiDAR设备的必要性,这可能会加速相互联系的自动驾驶车辆的普及。现在可以在https://dolphins-dataset.net/上获得海豚。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Pure transformers have shown great potential for vision tasks recently. However, their accuracy in small or medium datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process by distillation, the gap between teacher and student networks would lead to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both teacher and student networks for better distillation results. Benefiting from the online distillation, thousands of subnets in the supernet are well-trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100, respectively.
translated by 谷歌翻译
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold onto some higher dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. Simulation study and real data analysis are carried out to demonstrate the utilities of our eBO framework by applying the eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
translated by 谷歌翻译
Optical flow, which computes the apparent motion from a pair of video frames, is a critical tool for scene motion estimation. Correlation volume is the central component of optical flow computational neural models. It estimates the pairwise matching costs between cross-frame features, and is then used to decode optical flow. However, traditional correlation volume is frequently noisy, outlier-prone, and sensitive to motion blur. We observe that, although the recent RAFT algorithm also adopts the traditional correlation volume, its additional context encoder provides semantically representative features to the flow decoder, implicitly compensating for the deficiency of the correlation volume. However, the benefits of this context encoder has been barely discussed or exploited. In this paper, we first investigate the functionality of RAFT's context encoder, then propose a new Context Guided Correlation Volume (CGCV) via gating and lifting schemes. CGCV can be universally integrated with RAFT-based flow computation methods for enhanced performance, especially effective in the presence of motion blur, de-focus blur and atmospheric effects. By incorporating the proposed CGCV with previous Global Motion Aggregation (GMA) method, at a minor cost of 0.5% extra parameters, the rank of GMA is lifted by 23 places on KITTI 2015 Leader Board, and 3 places on Sintel Leader Board. Moreover, at a similar model size, our correlation volume achieves competitive or superior performance to state of the art peer supervised models that employ Transformers or Graph Reasoning, as verified by extensive experiments.
translated by 谷歌翻译
Image harmonization aims to produce visually harmonious composite images by adjusting the foreground appearance to be compatible with the background. When the composite image has photographic foreground and painterly background, the task is called painterly image harmonization. There are only few works on this task, which are either time-consuming or weak in generating well-harmonized results. In this work, we propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both spatial domain and frequency domain. The dual-domain generator performs harmonization by using AdaIn modules in the spatial domain and our proposed ResFFT modules in the frequency domain. The dual-domain discriminator attempts to distinguish the inharmonious patches based on the spatial feature and frequency feature of each patch, which can enhance the ability of generator in an adversarial manner. Extensive experiments on the benchmark dataset show the effectiveness of our method. Our code and model are available at https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.
translated by 谷歌翻译
Automatic defect detection for 3D printing processes, which shares many characteristics with change detection problems, is a vital step for quality control of 3D printed products. However, there are some critical challenges in the current state of practice. First, existing methods for computer vision-based process monitoring typically work well only under specific camera viewpoints and lighting situations, requiring expensive pre-processing, alignment, and camera setups. Second, many defect detection techniques are specific to pre-defined defect patterns and/or print schematics. In this work, we approach the automatic defect detection problem differently using a novel Semi-Siamese deep learning model that directly compares a reference schematic of the desired print and a camera image of the achieved print. The model then solves an image segmentation problem, identifying the locations of defects with respect to the reference frame. Unlike most change detection problems, our model is specially developed to handle images coming from different domains and is robust against perturbations in the imaging setup such as camera angle and illumination. Defect localization predictions were made in 2.75 seconds per layer using a standard MacBookPro, which is comparable to the typical tens of seconds or less for printing a single layer on an inkjet-based 3D printer, while achieving an F1-score of more than 0.9.
translated by 谷歌翻译
As a neural network compression technique, post-training quantization (PTQ) transforms a pre-trained model into a quantized model using a lower-precision data type. However, the prediction accuracy will decrease because of the quantization noise, especially in extremely low-bit settings. How to determine the appropriate quantization parameters (e.g., scaling factors and rounding of weights) is the main problem facing now. Many existing methods determine the quantization parameters by minimizing the distance between features before and after quantization. Using this distance as the metric to optimize the quantization parameters only considers local information. We analyze the problem of minimizing local metrics and indicate that it would not result in optimal quantization parameters. Furthermore, the quantized model suffers from overfitting due to the small number of calibration samples in PTQ. In this paper, we propose PD-Quant to solve the problems. PD-Quant uses the information of differences between network prediction before and after quantization to determine the quantization parameters. To mitigate the overfitting problem, PD-Quant adjusts the distribution of activations in PTQ. Experiments show that PD-Quant leads to better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, PD-Quant pushes the accuracy of ResNet-18 up to 53.08% and RegNetX-600MF up to 40.92% in weight 2-bit activation 2-bit. The code will be released at https://github.com/hustvl/PD-Quant.
translated by 谷歌翻译
In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. Cross Entropy (CE) loss has been shown to be not robust to noisy labels due to its unboundedness. To alleviate this issue, existing works typically design specialized robust losses with the symmetric condition, which usually lead to the underfitting issue. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, CE loss equipped with our LogitClip method is effectively bounded, mitigating the overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of CE loss, but also broadly enhances the generalization performance of popular robust losses.
translated by 谷歌翻译